Creators/Authors contains: "MacDonald, Madolyn L."

  1. Abstract

    Background

    To select the most complete, continuous, and accurate assembly for an organism of interest, comprehensive quality assessment of assemblies is necessary. We present a novel tool, called Evaluation of De Novo Assemblies (EvalDNA), which uses supervised machine learning for the quality scoring of genome assemblies and does not require an existing reference genome for accuracy assessment.

    Results

    EvalDNA calculates a list of quality metrics from an assembled sequence and applies a model created from supervised machine learning methods to integrate various metrics into a comprehensive quality score. A well-tested, accurate model for scoring mammalian genome sequences is provided as part of EvalDNA. This random forest regression model evaluates an assembled sequence based on continuity, completeness, and accuracy, and was able to explain 86% of the variation in reference-based quality scores within the testing data. EvalDNA was applied to human chromosome 14 assemblies from the GAGE study to rank genome assemblers and to compare EvalDNA to two other quality evaluation tools. In addition, EvalDNA was used to evaluate several genome assemblies of the Chinese hamster genome to help establish a better reference genome for the biopharmaceutical manufacturing community. EvalDNA was also used to assess more recent human assemblies from the QUAST-LG study completed in 2018, and its ability to score bacterial genomes was examined through application on bacterial assemblies from the GAGE-B study.

    Conclusions

    EvalDNA enables scientists to easily identify the best available genome assembly for their organism of interest without requiring a reference assembly. EvalDNA sets itself apart from other quality assessment tools by producing a quality score that enables direct comparison among assemblies from different species.
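    The "86% of the variation" figure in the Results is most naturally read as a coefficient of determination (R²) between the reference-based quality scores and the scores predicted by the model; the abstract does not state how it was computed, so that reading is an assumption. A minimal sketch of the calculation, with a hypothetical function name and made-up scores that are not EvalDNA output:

```python
def r_squared(reference_scores, predicted_scores):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration only; not part of EvalDNA.
    """
    if len(reference_scores) != len(predicted_scores) or not reference_scores:
        raise ValueError("score lists must be non-empty and of equal length")

    mean_ref = sum(reference_scores) / len(reference_scores)
    # Total variation of the reference-based scores around their mean.
    ss_tot = sum((y - mean_ref) ** 2 for y in reference_scores)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2
                 for y, p in zip(reference_scores, predicted_scores))
    if ss_tot == 0:
        raise ValueError("reference scores have no variation")
    return 1 - ss_res / ss_tot


# Made-up scores, purely illustrative.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(round(r_squared(reference, predicted), 3))
```

    A value of 1 means the predictions reproduce the reference scores exactly, while 0 means they do no better than always predicting the mean reference score.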

  2. Abstract

    The Chinese hamster genome serves as a reference genome for the study of Chinese hamster ovary (CHO) cells, the preferred host system for biopharmaceutical production. Recent re‐sequencing of the Chinese hamster genome resulted in the RefSeq PICR meta‐assembly, a set of highly accurate scaffolds that filled over 95% of the gaps in previous assembly versions. However, these scaffolds did not reach chromosome scale because long‐range scaffolding information was not available during the meta‐assembly process. Here, long‐range scaffolding of the PICR Chinese hamster genome assembly was performed using high‐throughput chromosome conformation capture (Hi‐C). This process resulted in a new “PICRH” genome, in which 97% of the genome is contained in 11 mega‐scaffolds corresponding to the Chinese hamster chromosomes (2n = 22) and the total number of scaffolds is reduced nearly three‐fold, from 1,830 scaffolds in PICR to 647 in PICRH. Continuity was improved while preserving accuracy, leading to quality scores higher than those of recent builds of mouse chromosomes and comparable to those of human chromosomes. The PICRH genome assembly will be an indispensable tool for designing advanced genetic engineering strategies in CHO cells and for enabling systematic examination of genomic and epigenomic instability through comparative analysis of CHO cell lines on a common set of chromosomal coordinates.

  3. Abstract

    Complete, accurate genome assemblies are necessary to design targets for genetic engineering strategies. Successful gene knockdowns and knockouts in Chinese hamster ovary (CHO) cells may prevent the expression of difficult‐to‐remove host cell proteins (HCPs). HCPs, if not removed, can compromise the stability, safety, and efficacy of the biotherapeutic. A significantly improved Chinese hamster (CH) reference genome was used to identify new knockout targets whose predicted functions and characteristics are similar to those of the difficult‐to‐remove host cell lipases LPL, PLBL2, and LPLA2. The CHO‐K1 gene and protein sequences of several of these lipases were corrected using the updated CH genome. Sequence alignments were then used to identify conserved regions that may serve as targets for multiple simultaneous gene knockouts. Finally, comparison of the CHO‐K1 lipase protein sequences with their human orthologs provided insight into which lipases, if persistent in the drug product, could cause immunogenic responses in patients. Topical heading: Biomolecular Engineering, Bioengineering, Biochemicals, Biofuels, and Food. © 2018 American Institute of Chemical Engineers. AIChE J, 64: 4247–4254, 2018
